Add TinyOpenFold: GPU Optimization Tutorial with AlphaFold 2 Evoformer by asitav · Pull Request #164 · amd/HPCTrainingExamples

asitav · 2026-05-29T22:55:55Z

Summary

This PR adds TinyOpenFold, an educational example that demonstrates GPU optimization techniques on AMD GPUs. The tutorial implements an AlphaFold 2 Evoformer architecture in three progressive stages, from a baseline PyTorch version to custom Triton kernels.

Key Features

Three Progressive Optimization Stages:
- V1 (Baseline): Clean PyTorch implementation
- V2 (Kernel Fusion): PyTorch-level optimizations with kernel fusion
- V3 (Custom Triton Kernels): Hand-optimized GPU kernels using Triton
Comprehensive Profiling Integration:
- PyTorch Profiler for high-level bottleneck identification
- rocprof-sys for system-level GPU traces and kernel timelines
- rocprofv3 for detailed kernel metrics and launch counts
- rocprof-compute for hardware counter analysis and memory bandwidth
Complete Educational Pipeline:
- Step-by-step optimization tutorial with detailed explanations
- Ablation studies showing individual optimization contributions
- Performance analysis with bottleneck decomposition
- ROCm profiling tool integration at each stage
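
To make the step from V1 to V2 more concrete, here is a conceptual sketch of operator fusion written in plain Python rather than GPU code. The operation chain (scale, add bias, ReLU) and the function names are illustrative assumptions and are not taken from the tutorial's source. The unfused version materializes an intermediate list after every step, much as separate GPU kernels write intermediates back to memory, while the fused version does the same arithmetic in a single pass.

```python
from typing import List


def unfused(x: List[float], scale: float, bias: float) -> List[float]:
    """Three separate passes, each producing its own buffer."""
    scaled = [v * scale for v in x]        # pass 1: intermediate buffer
    shifted = [v + bias for v in scaled]   # pass 2: intermediate buffer
    return [max(v, 0.0) for v in shifted]  # pass 3: final output


def fused(x: List[float], scale: float, bias: float) -> List[float]:
    """One pass: the same per-element arithmetic, no intermediate buffers."""
    return [max(v * scale + bias, 0.0) for v in x]


if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5, 3.0]
    print(unfused(data, 2.0, 1.0))
    print(fused(data, 2.0, 1.0))
```

Both functions return the same values; what changes is how many times the data is traversed and how many temporary buffers are created, which is the cost that fusion targets.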

What's Included

MLExamples/TinyOpenFold/
├── README.md                              # Main documentation
├── ARCHITECTURE.md                        # Evoformer architecture details
├── PERFORMANCE_OPTIMIZATION_TUTORIAL.md   # Complete optimization guide
├── optimization_tutorial.sh               # Automated tutorial script
├── version1_pytorch_baseline/            # Baseline PyTorch (V1)
│   ├── tiny_openfold_v1.py
│   ├── run_deepspeed_flops.sh           # FLOPs analysis
│   └── FLOPS_ANALYSIS.md                # FLOPs profiling guide
├── version2_pytorch_fused/               # Kernel fusion (V2)
│   ├── tiny_openfold_v2.py
│   ├── run_rocprofv3.sh                 # Kernel profiling
│   ├── run_rocprof_sys.sh               # System profiling
│   └── run_rocprof_compute.sh           # Hardware counters
└── version3_triton/                      # Custom kernels (V3)
    ├── tiny_openfold_v3.py
    ├── launch_performance_study.sh      # V1/V2/V3 comparison
    └── profiling scripts
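
As background for the FLOPs analysis files listed above, a hand estimate can serve as a rough cross-check on profiler output. The sketch below uses the common convention of counting 2 FLOPs per multiply-accumulate and applies it to a single linear projection over the MSA representation. The channel widths are hypothetical placeholders, and the helper names are not part of the PR.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product, counting multiply and add separately."""
    return 2 * m * k * n


def linear_flops(batch: int, n_seq: int, n_res: int, c_in: int, c_out: int) -> int:
    """FLOPs for a bias-free linear layer applied to every (sequence, residue) position."""
    rows = batch * n_seq * n_res  # each position is one row of the matmul
    return matmul_flops(rows, c_in, c_out)


if __name__ == "__main__":
    # "Small" problem size from this PR; c_in = c_out = 64 is an assumed channel width.
    print(linear_flops(batch=4, n_seq=16, n_res=64, c_in=64, c_out=64))
```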

Problem Sizes

The tutorial demonstrates optimization across different problem sizes:

- Small: 64 residues, 16 MSA sequences, batch size 4
- Medium: 128 residues, 32 MSA sequences, batch size 2
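
To give a feel for how these sizes translate into tensor footprints, the following sketch counts elements in the two Evoformer representations: the MSA representation of shape (N_seq, N_res, c_m) and the pair representation of shape (N_res, N_res, c_z), each with a leading batch dimension. The channel widths `C_M` and `C_Z` are hypothetical values chosen for illustration; the PR does not state the ones TinyOpenFold uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemSize:
    name: str
    n_res: int   # residues
    n_seq: int   # MSA sequences
    batch: int   # batch size

    def msa_elements(self, c_m: int) -> int:
        """Elements in a (batch, n_seq, n_res, c_m) MSA representation."""
        return self.batch * self.n_seq * self.n_res * c_m

    def pair_elements(self, c_z: int) -> int:
        """Elements in a (batch, n_res, n_res, c_z) pair representation."""
        return self.batch * self.n_res * self.n_res * c_z


SMALL = ProblemSize("small", n_res=64, n_seq=16, batch=4)
MEDIUM = ProblemSize("medium", n_res=128, n_seq=32, batch=2)

C_M, C_Z = 64, 32   # assumed channel widths, not taken from the PR
BYTES_FP32 = 4

if __name__ == "__main__":
    for size in (SMALL, MEDIUM):
        msa_mib = size.msa_elements(C_M) * BYTES_FP32 / 2**20
        pair_mib = size.pair_elements(C_Z) * BYTES_FP32 / 2**20
        print(f"{size.name}: MSA {msa_mib:.1f} MiB, pair {pair_mib:.1f} MiB")
```

Because the pair representation grows quadratically with the number of residues, the medium configuration holds twice as many pair elements as the small one even though its batch size is halved.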

Educational Value

This example teaches:

- Systematic GPU optimization methodology
- Profiling techniques (PyTorch Profiler + ROCm tools)
- Kernel fusion strategies with PyTorch
- Custom GPU kernel development with Triton
- Performance analysis and bottleneck identification
- Memory vs. speed trade-offs
- AlphaFold 2 Evoformer architecture
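
One common way to frame bottleneck identification is the roofline model, which compares a kernel's arithmetic intensity (FLOPs per byte moved) with the ratio of peak compute to peak memory bandwidth. The sketch below is a generic illustration and may not match the framing used in the tutorial itself. The peak numbers are round placeholders, not MI300X specifications.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte read from or written to memory."""
    return flops / bytes_moved


def attainable_flops(intensity: float, peak_flops: float, peak_bandwidth: float) -> float:
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, intensity * peak_bandwidth)


def is_memory_bound(intensity: float, peak_flops: float, peak_bandwidth: float) -> bool:
    """True when the kernel sits to the left of the roofline ridge point."""
    return intensity < peak_flops / peak_bandwidth


if __name__ == "__main__":
    PEAK_FLOPS = 100e12      # placeholder: 100 TFLOP/s
    PEAK_BANDWIDTH = 5e12    # placeholder: 5 TB/s

    # fp32 elementwise add: 1 FLOP per element, 2 reads + 1 write of 4 bytes each.
    ai = arithmetic_intensity(flops=1.0, bytes_moved=12.0)
    print(is_memory_bound(ai, PEAK_FLOPS, PEAK_BANDWIDTH))
    print(attainable_flops(ai, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under this model, elementwise operations are almost always memory bound, which is why fusing them to avoid extra reads and writes tends to pay off.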

Testing

- Tested on AMD Instinct MI300X with ROCm 7.2
- All three versions produce numerically identical outputs
- Includes validation mode (--validate-setup)
- Multi-GPU scaling tested (1, 2, 4, 8 GPUs)
- Comprehensive profiling integration verified
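
A cross-version output check can be sketched as follows. This is not the PR's validation code: it assumes each version's output has been flattened into a sequence of Python floats, and the tolerance defaults are arbitrary choices. Comparing with an explicit tolerance is a common precaution because fused or hand-written kernels can reorder floating-point operations.

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest elementwise absolute difference between two equal-length outputs."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(a: Sequence[float], b: Sequence[float],
                  rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """True when every element pair agrees within the given tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rtol, abs_tol=atol) for x, y in zip(a, b)
    )


if __name__ == "__main__":
    v1_out = [0.25, -1.0, 3.5]           # stand-in values, not real model output
    v2_out = [0.25, -1.0, 3.5 + 1e-9]
    print(outputs_match(v1_out, v2_out), max_abs_diff(v1_out, v2_out))
```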

Documentation

- Complete optimization tutorial (PERFORMANCE_OPTIMIZATION_TUTORIAL.md) with step-by-step guide
- Architecture documentation (ARCHITECTURE.md)
- Version-specific READMEs for each implementation
- Automated tutorial script (optimization_tutorial.sh)
- ROCm profiling tool integration guide

Target Audience

- ML engineers learning GPU optimization
- Researchers working with protein structure prediction
- Students studying AlphaFold 2 architecture
- Developers optimizing deep learning workloads on AMD GPUs

…ling
- Add automatic detection of rocpd package availability
- Conditionally enable ROCPROFSYS_USE_ROCPD only if rocpd is found
- Set ROCPROFSYS_CONFIG_FILE only if ~/.rocprof-sys.cfg exists
- Add --trace flag to rocprof-sys-python command
- Update help text with accurate configuration information
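
The conditional setup described in this commit can be sketched in Python as follows. The actual change lives in a shell script, so this is a re-expression under assumptions: the function name is hypothetical, and the value `"ON"` for ROCPROFSYS_USE_ROCPD is a guess at the expected setting rather than something stated in the PR.

```python
import importlib.util
from pathlib import Path
from typing import Dict, Optional


def build_rocprof_sys_env(home: Optional[Path] = None,
                          module_name: str = "rocpd") -> Dict[str, str]:
    """Return only the environment variables whose prerequisites are present."""
    env: Dict[str, str] = {}

    # Enable rocpd output only when the Python package can be found.
    if importlib.util.find_spec(module_name) is not None:
        env["ROCPROFSYS_USE_ROCPD"] = "ON"  # assumed value

    # Point at the user config file only when it actually exists.
    cfg = (home if home is not None else Path.home()) / ".rocprof-sys.cfg"
    if cfg.is_file():
        env["ROCPROFSYS_CONFIG_FILE"] = str(cfg)

    return env


if __name__ == "__main__":
    print(build_rocprof_sys_env())
```

The returned dictionary could be merged into `os.environ` before launching the profiler, so that a missing package or config file leaves the corresponding variable unset rather than pointing at something that does not exist.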

…H fixes

- Updated venv names from venvOF/venvOFr711 to simple venv
- Changed ROCm module from 7.1.1 to 7.2 (PyTorch still uses ROCm 7.1 nightly)
- V2 README now references main README for environment setup to avoid duplication
- Updated requirements.txt with current dependencies

- Add pip install command for requirements_rocprof-compute-develop.txt in README.md
- Include new requirements file for rocprof-compute development dependencies
- Source: https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-compute/requirements.txt


asitav and others added 30 commits November 5, 2025 18:14

First commit of TinyOpenFold.

9c6be67

Incorporate multi-GPU run. Add run scripts. Update documents.

a5faf10

Add gitignore for TinyOpenFold.

8141409

Fixed pytorch profiling option typo bug.

5eb16ff

Add DeepSpeed FLOPS profiling tools to openfold example.

9aa85d6

Add pytorch profiling python script.

ba5f13c

More clean ups in TinyOpenFold version1.

0fb40d8

Add kernel fusion optimized version2 to TinyOpenFold example.

4ae3db5

Fixed argument error in performance study script.

c415860

Added README file for version2 TinyOpenFold example. Also placeholder exercises/ README file.

3d0db85

Added a sample performance study results wrt baseline.

ab13c58

Triton GPU kernel implementation of TinyOpenFold done.

186ac3e

Fixed issues with the performance study comparison script for triton implementation.

0731831

Add a sample performance study example directory.

fd05c40

Clean ups.

a406b8e

Updates on rocprofv3 profiling commands.

41bec11

Fixed cuda_time_total() error issue.

36c8337

Add throughput info in pytorch profiling script. More clean ups.

6f11c1e

More clean ups of rocprofv3 script.

ff37c9e

Add more user options and introduce shorter user option names.

6de32b8

More clean ups of rocprofv3 script.

78928f0

Merge branch 'amd:main' into tiny_openfold

3cf5abb

Ensure profile directory is present before run profile info is saved.

e42867a
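
The fix described in this commit follows a common pattern: create the output directory before writing into it. A minimal sketch, assuming run metadata is saved as JSON; the function name and the `run_info.json` file name are hypothetical and not taken from the PR.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def save_run_info(profile_dir: Union[str, Path], info: Dict[str, Any]) -> Path:
    """Write run metadata to profile_dir, creating the directory if needed."""
    out_dir = Path(profile_dir)
    # parents=True creates missing ancestors; exist_ok=True makes repeat runs safe.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "run_info.json"
    out_path.write_text(json.dumps(info, indent=2))
    return out_path
```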

Add environment setup and dependency installations for TinyOpenFold.

8e222b6

Add accuracy tests for fusion implementation.

f0cd0e1

Updated TinyOpenFold Architecture notes.

8a504f2

Minor clean up.

6ce386a

Updated rocprofv3 profiling scripts for Triton implementation of TinyOpenFold.

43bff2e

Update install instructions for rocm/7.1.1.

a3d7c5c

Refactor profiling script for Tiny OpenFold V2 to use rocprof-sys-python for Python call stack profiling. Updated default parameters for batch size and sequence length to optimize output size. Enhanced README with detailed usage instructions and output file descriptions.

23adf0b

asitav and others added 11 commits January 13, 2026 15:42

Updates for rocprof-sys run. Kernel traces are still not visible.

f3b7eb3

Update PyTorch installation to use nightly ROCm 7.1 pip repository

6148cb3

- Replace manual wheel downloads with pip install from nightly repository
- Update requirements.txt with new PyTorch versions
- Simplify installation process
- Update run_rocprof_sys.sh file.

Update README, requirements.txt, and run_rocprof_sys.sh with ROCM_PATH fixes

46c36b6

Add GPU optimization tutorial: V1→V2→V3 achieving 2.0x speedup

5c40b08

Includes comprehensive tutorial docs and automated test script for demonstrating progressive optimization from baseline PyTorch to custom Triton kernels.

Minor edits of optimization tutorial of TinyOpenFold.

f371b5f

Added rocm profiling tools usage instructions TinyOpenFold tutorial.

6cff566

Merge branch 'amd:main' into tiny_openfold

e96d1b8

Clean up documentation and remove exercises

b073ac1

- Optimized FLOPS_ANALYSIS.md for conciseness
- Removed redundant files and scaling scripts
- Removed exercises directories from v2 and v3
- Updated documentation references

asitav wants to merge 41 commits into amd:main from asitav:tiny_openfold
